thr3ads.net - llvm dev - [LLVMdev] Supporting heterogeneous computing in llvm. [Jun 2015]


Sergey Ostanevich

2015-Jun-08 20:46 UTC

[LLVMdev] Supporting heterogeneous computing in llvm.

Roel,

You have to check out and build llvm/clang as usual.
For runtime support you'll have to build libomptarget and write a
plugin for your target. Samuel can help you some more.
As for OpenMP examples, I can recommend
http://openmp.org/mp-documents/OpenMP4.0.0.Examples.pdf
Look into the target constructs.
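
For a first taste of the target constructs, here is a minimal sketch (it is not taken from the examples document). The function name saxpy_offload and its data are made up for illustration; the map clauses follow OpenMP 4.0 syntax. Built without OpenMP support, the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y = a*x + y, with the loop marked for offloading. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* x is copied to the device; y is copied there and back again. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The array sections x[0:n] and y[0:n] tell the compiler how much data each pointer covers, which it cannot infer from the pointer types alone.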

Sergos


On Mon, Jun 8, 2015 at 6:13 PM, Roel Jordans <r.jordans at tue.nl> wrote:
> Hi Sergos,
>
> I'd like to try this on our hardware.  Is there some example code that I
> could use to get started?
>
> Cheers,
>  Roel
>
>
> On 08/06/15 13:27, Sergey Ostanevich wrote:
>>
>> Chris,
>>
>> Have you seen the offloading infrastructure design proposal at
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html ?
>> It relies on the long-standing OpenMP standard with recent updates to
>> support heterogeneous computation.
>> Could you please review it and comment on how it fits your needs?
>>
>> It's not quite clear from your proposal what source language standard
>> you plan to support - you just mention that OpenCL will be one of
>> your backends, as far as I understood. What's your plan on sources -
>> C/C++/FORTRAN?
>> How would you control the offloading, data transfer, scheduling and so
>> on? Will it be new language constructs, similar to parallel_for
>> in Cilk Plus, or will it be pragma-based like in OpenMP or OpenACC?
>>
>> The design I mentioned above has a working implementation for the NVIDIA
>> target at
>>
>> https://github.com/clang-omp/llvm_trunk
>> https://github.com/clang-omp/clang_trunk
>>
>> with runtime implemented at
>>
>> https://github.com/clang-omp/libomptarget
>>
>> you're welcome to try it out, if you have an appropriate device.
>>
>> Regards,
>> Sergos
>>
>> On Sat, Jun 6, 2015 at 2:24 PM, Christos Margiolas
>> <chrmargiolas at gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> Thank you very much for the feedback. I believe that the heterogeneous
>>> engine should be strongly connected with the parallelization and
>>> vectorization efforts. Most accelerators are parallel architectures
>>> where efficient parallelization and vectorization can be critical for
>>> performance.
>>>
>>> I am interested in these efforts and I hope that my code can help you
>>> manage the offloading operations. Your LLVM instruction set extensions
>>> may require some changes in the analysis code, but I think that is going
>>> to be straightforward.
>>>
>>> I am planning to push my code to Phabricator in the next few days.
>>>
>>> thanks,
>>> Chris
>>>
>>>
>>> On Fri, Jun 5, 2015 at 3:45 AM, Adve, Vikram Sadanand
>>> <vadve at illinois.edu>
>>> wrote:
>>>>
>>>>
>>>> Christos,
>>>>
>>>> We would be very interested in learning more about this.
>>>>
>>>> In my group, we (Prakalp Srivastava, Maria Kotsifakou and I) have been
>>>> working on LLVM extensions to make it easier to target a wide range of
>>>> accelerators in a heterogeneous mobile device, such as Qualcomm's
>>>> Snapdragon and other APUs.  Our approach has been to (a) add better
>>>> abstractions of parallelism to the LLVM instruction set that can be
>>>> mapped down to a wide range of parallel hardware accelerators; and (b)
>>>> to develop optimizing "back-end" translators to generate efficient code
>>>> for the accelerators from the extended IR.
>>>>
>>>> So far, we have been targeting GPUs and vector hardware, but semi-custom
>>>> (programmable) accelerators are our next goal.  We have discussed DSPs
>>>> as a valuable potential goal as well.
>>>>
>>>> Judging from the brief information here, I'm guessing that our projects
>>>> have been quite complementary.  We have not worked on the extraction
>>>> passes, scheduling, or other run-time components you mention and would
>>>> be happy to use an existing solution for those.  Our hope is that the IR
>>>> extensions and translators will give your schedulers greater flexibility
>>>> to retarget the extracted code components to different accelerators.
>>>>
>>>> --Vikram S. Adve
>>>> Visiting Professor, School of Computer and Communication Sciences, EPFL
>>>> Professor, Department of Computer Science
>>>> University of Illinois at Urbana-Champaign
>>>> vadve at illinois.edu
>>>> http://llvm.org
>>>>
>>>>
>>>>
>>>>
>>>> On Jun 5, 2015, at 3:18 AM, llvmdev-request at cs.uiuc.edu wrote:
>>>>
>>>>> Date: Thu, 4 Jun 2015 17:35:25 -0700
>>>>> From: Christos Margiolas <chrmargiolas at gmail.com>
>>>>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>>>>> Subject: [LLVMdev] Supporting heterogeneous computing in llvm.
>>>>> Message-ID:
>>>>> <CAC3KUCx0mpBrnrGjDVxQzxtBpnJXtw3herZ_E2pQoSqSyMNsKA at mail.gmail.com>
>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>
>>>>> Hello All,
>>>>>
>>>>> For the last two months I have been working on the design and
>>>>> implementation of a heterogeneous execution engine for LLVM. I started
>>>>> this project as an intern at the Qualcomm Innovation Center and I
>>>>> believe it can be useful to different people and use cases. I am
>>>>> planning to share more details and a set of patches in the next few
>>>>> days. However, I would first like to see if there is interest in this.
>>>>>
>>>>> The project is about providing compiler and runtime support for the
>>>>> automatic and transparent offloading of loop or function workloads to
>>>>> accelerators.
>>>>>
>>>>> It is composed of the following:
>>>>> a) Compiler and transformation passes for extracting loops or functions
>>>>> for offloading.
>>>>> b) A runtime library that handles scheduling, data sharing and
>>>>> coherency between the host and accelerator sides.
>>>>> c) A modular codebase and design. Adaptors specialize the code
>>>>> transformations for the target accelerators. Runtime plugins manage the
>>>>> interaction with the different accelerator environments.
>>>>>
>>>>> So far, this work supports the Qualcomm DSP accelerator, but I am
>>>>> planning to extend it to support OpenCL accelerators. I have also
>>>>> developed a debug port where I can test the passes and the runtime
>>>>> without requiring an accelerator.
>>>>>
>>>>>
>>>>> The project is still at an early R&D stage and I am looking forward to
>>>>> feedback and to gauging the interest level. I am willing to continue
>>>>> working on this as an open source project and bring it into the right
>>>>> shape so it can be merged into the LLVM tree.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Chris
>>>>>
>>>>> P.S. I intend to join the LLVM social in the Bay Area tonight and I will
>>>>> be more than happy to talk about it.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Samuel Antão

2015-Jun-08 22:07 UTC

[LLVMdev] Supporting heterogeneous computing in llvm.

Hi Roel, Chris,

This is a summary of how you can add support for a different offloading
device on top of what we have on GitHub for OpenMP:

a) Download and install llvm (https://github.com/clang-omp/llvm_trunk) and
clang (https://github.com/clang-omp/clang_trunk) as usual.

b) Install the official LLVM OpenMP runtime library (openmp.llvm.org). Clang
expects it to be present in your library path in order to compile
OpenMP code (even if you do not need any OpenMP feature other than
offloading).

c) Install https://github.com/clang-omp/libomptarget (running 'make' should
do it). This library implements the API to control offloading. It also
contains a set of plugins for the targets we are testing this with -
x86_64, powerpc64 and NVPTX - in ./RTLs. You will need to implement a
plugin for your target as well. The interface used for these plugins is
detailed in the document proposed in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html . You can
look at the existing plugins for hints. In a nutshell, you would have to
implement code that allocates and moves data to your device, returns a
table of entry points and global variables given a device library, and
launches execution of a given entry point with the provided list of
arguments.

d) The current implementation expects the device library to use the ELF
format. There is no reason for that other than that the platforms we have
tested this with so far use ELF. If your device does not use ELF,
__tgt_register_lib() (src/omptarget.cpp) would have to be extended to
understand your desired format. Otherwise you may just update
src/targets_info.cpp with your ELF ID and plugin name.

e) Offloading is driven by clang, so it has to be aware of the toolchain
required by your device. If your device toolchain is not implemented in
clang you would have to do that in lib/Driver/ToolChains.cpp.

f) Once everything is in place, you can compile your code by running
something like “clang -fopenmp -omptargets=your-target-triple app.c”. If
you do separate compilation you will see that two different files are
generated for a given source file (the target file has the suffix
tgt-your-target-triple).
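
To give a feel for steps c) and d), here is a rough sketch of the duties a plugin takes on, written against host memory so that it runs without an accelerator. Every name in it (hyp_plugin, hyp_entry, hyp_find_plugin and so on) is hypothetical and chosen for illustration only; this is not the libomptarget interface, which is described in the proposal linked in step c). The value 62 is EM_X86_64 from the ELF specification, and the plugin name string is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not the libomptarget interface. */
typedef void (*hyp_entry_fn)(void **args);

/* One row of the entry-point table returned for a device library. */
typedef struct {
    const char  *name;
    hyp_entry_fn fn;
} hyp_entry;

/* What a plugin has to provide, per step c), plus the ELF ID from step d). */
typedef struct {
    unsigned short elf_machine;   /* e_machine value of the device library */
    const char    *plugin_name;
    void *(*data_alloc)(size_t size);
    int   (*data_submit)(void *dev, const void *host, size_t size);
    int   (*data_retrieve)(void *host, const void *dev, size_t size);
    void  (*data_free)(void *dev);
    const hyp_entry *(*load_library)(const void *image, size_t *count);
    int   (*run_region)(hyp_entry_fn entry, void **args);
} hyp_plugin;

/* A stand-in "device" region: doubles n ints in device memory. */
static void region_double(void **args)
{
    int *data = (int *)args[0];
    int  n    = *(int *)args[1];
    for (int i = 0; i < n; ++i)
        data[i] *= 2;
}

static const hyp_entry host_entries[] = { { "region_double", region_double } };

/* Host-memory implementations: the "device" is just the heap. */
static void *host_alloc(size_t size) { return malloc(size); }
static void  host_free(void *dev)    { free(dev); }

static int host_submit(void *dev, const void *host, size_t size)
{
    memcpy(dev, host, size);
    return 0;
}

static int host_retrieve(void *host, const void *dev, size_t size)
{
    memcpy(host, dev, size);
    return 0;
}

static const hyp_entry *host_load(const void *image, size_t *count)
{
    (void)image;              /* a real plugin would parse the device image */
    assert(count != NULL);
    *count = sizeof host_entries / sizeof host_entries[0];
    return host_entries;
}

static int host_run(hyp_entry_fn entry, void **args)
{
    if (entry == NULL)
        return -1;
    entry(args);
    return 0;
}

/* Registry keyed by ELF machine ID; 62 is EM_X86_64 in the ELF specification. */
static const hyp_plugin plugins[] = {
    { 62, "hyp-host-plugin", host_alloc, host_submit, host_retrieve,
      host_free, host_load, host_run },
};

/* Returns the plugin registered for an ELF machine ID, or NULL if none. */
const hyp_plugin *hyp_find_plugin(unsigned short e_machine)
{
    for (size_t i = 0; i < sizeof plugins / sizeof plugins[0]; ++i)
        if (plugins[i].elf_machine == e_machine)
            return &plugins[i];
    return NULL;
}
```

A caller would look up the plugin by the machine ID found in the device library, allocate and submit the data, fetch the entry table, run an entry and retrieve the results. A real device plugin would replace malloc and memcpy with the device's own allocation and transfer calls.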

I should say that in general OpenMP requires a runtime library for the
device as well; however, if you do not use any OpenMP pragmas inside your
target code you won’t need that.
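
To make the distinction concrete, here is a hedged sketch of a target region that does use an OpenMP pragma inside it. The function name dot_offload is made up for illustration. Going by the remark above, a region like this one would need the device runtime library, whereas a target region containing only a plain loop would not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product with a parallel loop inside the region. */
double dot_offload(int n, const double *x, const double *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    double sum = 0.0;

    /* Inputs go to the device; sum travels both ways. */
    #pragma omp target map(to: x[0:n], y[0:n]) map(tofrom: sum)
    {
        /* This nested pragma is what calls for OpenMP support on the device. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += x[i] * y[i];
    }
    return sum;
}
```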

We have started porting our offloading-related code, currently on GitHub,
to upstream clang. The driver support is currently under review at
http://reviews.llvm.org/D9888. We are about to send our first offloading
codegen patches as well.

I understand that what Chris is proposing is somewhat different than what
we have in place, given that the transformations are intended to be in LLVM
IR. However, the goal seems to be the same. I hope the summary above gives
you some hints on whether your use cases can be accommodated.

Feel free to ask any questions you may have.

Thanks!

Samuel



Roel Jordans

2015-Jun-09 13:32 UTC

[LLVMdev] Supporting heterogeneous computing in llvm.

Hi Sergos and Samuel,

Thanks for the links, I've got it mostly working now.

I still have a problem with linking the code.  It seems that the clang
driver doesn't pass its library search path to nvlink when linking the
generated CUDA code against the target library, so nvlink fails to
find libtarget-nvptx.a.  Is there some flag or environment variable
that I should set here?  Manually passing nvlink a -L flag
pointing to the appropriate path seems to work for the linking step.

Cheers,
  Roel

