Sergey Ostanevich
2015-Jun-08 11:27 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Chris,

Have you seen the offloading infrastructure design proposal at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html ?
It relies on the long-standing OpenMP standard, with recent updates to
support heterogeneous computation. Could you please review it and comment
on how it fits your needs?

It's not quite clear from your proposal which source language standard you
plan to support - you only mention that OpenCL will be one of your
backends, as far as I understand. What's your plan for source languages -
C/C++/Fortran? How would you control the offloading, data transfer,
scheduling and so on? Will it be new language constructs, similar to
parallel_for in Cilk Plus, or will it be pragma-based as in OpenMP or
OpenACC?

The design I mentioned above has an operable implementation for the NVIDIA
target at

https://github.com/clang-omp/llvm_trunk
https://github.com/clang-omp/clang_trunk

with the runtime implemented at

https://github.com/clang-omp/libomptarget

You're welcome to try it out if you have an appropriate device.

Regards,
Sergos

On Sat, Jun 6, 2015 at 2:24 PM, Christos Margiolas <chrmargiolas at gmail.com> wrote:
> Hello,
>
> Thank you a lot for the feedback. I believe that the heterogeneous engine
> should be strongly connected with the parallelization and vectorization
> efforts. Most accelerators are parallel architectures where efficient
> parallelization and vectorization can be critical for performance.
>
> I am interested in these efforts and I hope that my code can help you
> manage the offloading operations. Your LLVM instruction set extensions
> may require some changes in the analysis code, but I think that is going
> to be straightforward.
>
> I am planning to push my code to Phabricator in the next few days.
>
> thanks,
> Chris
>
> On Fri, Jun 5, 2015 at 3:45 AM, Adve, Vikram Sadanand <vadve at illinois.edu> wrote:
>> Christos,
>>
>> We would be very interested in learning more about this.
>> In my group, we (Prakalp Srivastava, Maria Kotsifakou and I) have been
>> working on LLVM extensions to make it easier to target a wide range of
>> accelerators in a heterogeneous mobile device, such as Qualcomm's
>> Snapdragon and other APUs. Our approach has been to (a) add better
>> abstractions of parallelism to the LLVM instruction set that can be
>> mapped down to a wide range of parallel hardware accelerators; and (b)
>> develop optimizing "back-end" translators to generate efficient code
>> for the accelerators from the extended IR.
>>
>> So far, we have been targeting GPUs and vector hardware, but semi-custom
>> (programmable) accelerators are our next goal. We have discussed DSPs as
>> a valuable potential target as well.
>>
>> Judging from the brief information here, I'm guessing that our projects
>> have been quite complementary. We have not worked on the extraction
>> passes, scheduling, or other run-time components you mention and would
>> be happy to use an existing solution for those. Our hope is that the IR
>> extensions and translators will give your schedulers greater flexibility
>> to retarget the extracted code components to different accelerators.
>>
>> --Vikram S. Adve
>> Visiting Professor, School of Computer and Communication Sciences, EPFL
>> Professor, Department of Computer Science
>> University of Illinois at Urbana-Champaign
>> vadve at illinois.edu
>> http://llvm.org
>>
>> On Jun 5, 2015, at 3:18 AM, llvmdev-request at cs.uiuc.edu wrote:
>>
>> > Date: Thu, 4 Jun 2015 17:35:25 -0700
>> > From: Christos Margiolas <chrmargiolas at gmail.com>
>> > To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>> > Subject: [LLVMdev] Supporting heterogeneous computing in llvm.
>> > Message-ID:
>> > <CAC3KUCx0mpBrnrGjDVxQzxtBpnJXtw3herZ_E2pQoSqSyMNsKA at mail.gmail.com>
>> > Content-Type: text/plain; charset="utf-8"
>> >
>> > Hello All,
>> >
>> > For the last two months I have been working on the design and
>> > implementation of a heterogeneous execution engine for LLVM. I started
>> > this project as an intern at the Qualcomm Innovation Center and I
>> > believe it can be useful to different people and use cases. I am
>> > planning to share more details and a set of patches in the next few
>> > days. However, I would first like to see if there is interest in this.
>> >
>> > The project is about providing compiler and runtime support for the
>> > automatic and transparent offloading of loop or function workloads to
>> > accelerators.
>> >
>> > It is composed of the following:
>> > a) Compiler and transformation passes for extracting loops or
>> > functions for offloading.
>> > b) A runtime library that handles scheduling, data sharing and
>> > coherency between the host and accelerator sides.
>> > c) A modular codebase and design. Adaptors specialize the code
>> > transformations for the target accelerators. Runtime plugins manage
>> > the interaction with the different accelerator environments.
>> >
>> > So far, this work supports the Qualcomm DSP accelerator, but I am
>> > planning to extend it to support OpenCL accelerators. I have also
>> > developed a debug port where I can test the passes and the runtime
>> > without requiring an accelerator.
>> >
>> > The project is still at an early R&D stage and I am looking forward to
>> > feedback and to gauging the interest level. I am willing to continue
>> > working on this as an open-source project and bring it to the right
>> > shape so it can be merged into the LLVM tree.
>> >
>> > Regards,
>> > Chris
>> >
>> > P.S. I intend to join the LLVM social in the Bay Area tonight and I
>> > will be more than happy to talk about it.
>> > -------------- next part --------------
>> > An HTML attachment was scrubbed...
>> > URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20150604/289e4438/attachment-0001.html>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Sergos,

I'd like to try this on our hardware. Is there some example code that I
could use to get started?

Cheers,
Roel

On 08/06/15 13:27, Sergey Ostanevich wrote:
> [...]
Sergey Ostanevich
2015-Jun-08 20:46 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Roel,

You have to check out and build llvm/clang as usual. For runtime support
you'll have to build libomptarget and write a plugin for your target.
Samuel can help you some more. As for OpenMP examples, I can recommend
http://openmp.org/mp-documents/OpenMP4.0.0.Examples.pdf - look into the
target constructs.

Sergos

On Mon, Jun 8, 2015 at 6:13 PM, Roel Jordans <r.jordans at tue.nl> wrote:
> Hi Sergos,
>
> I'd like to try this on our hardware. Is there some example code that I
> could use to get started?
>
> Cheers,
> Roel
>
> [...]
Christos Margiolas
2015-Jun-09 07:28 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hello,

I can see some fundamental differences between this work and mine.
However, I think they are more complementary than "competitive". My work
handles code extraction for offloading (at the IR level) and offloading
control with a runtime library. This design is portable and not limited
to OpenMP or any other specific annotation scheme. It can handle
different types of source code for offloading, e.g. offloading of
sequential code, parallel loops, or even OpenCL kernels.

The runtime library is responsible for managing communication, coherency
and scheduling. The library exposes a simple interface and the compiler
generates calls to it. Plugins then provide support for the individual
accelerator types. The scheme you refer to could be supported on top of
my infrastructure.

I personally believe that code extraction and transformations for
offloading should be done at the IR level and not at the source level.
The reason is that at the IR level you have enough information about your
program (e.g. data types) and a good idea about your target
architectures.

--chris

On Mon, Jun 8, 2015 at 4:27 AM, Sergey Ostanevich <sergos.gnu at gmail.com> wrote:
> [...]
When you're detecting which regions of code to offload, can you also
detect the difference between a compute-bound kernel and a memory-bound
one?
Sergey Ostanevich
2015-Jun-09 15:40 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Chris,

From those two replies you sent I can't get the point.

Here you say the modules are compiled independently for host and target
(accelerator) and that the IR is not a good place to tackle the
architecture differences:

On Tue, Jun 9, 2015 at 10:17 AM, Christos Margiolas <chrmargiolas at gmail.com> wrote:
> In fact, I have two modules:
> a) the Host one
> b) the Accelerator one
>
> Each one gets compiled independently. The runtime takes care of the
> offloading operations and loads the accelerator code. Imagine that you
> want to compile for amd64 and NVIDIA PTX. You cannot do it in a single
> module, and even if you support it, it is going to become scary. How are
> you going to handle architecture differences that affect the IR in a
> nice way? e.g. pointer size, stack alignment and much more...

And 11 minutes later you write:

On Tue, Jun 9, 2015 at 10:28 AM, Christos Margiolas <chrmargiolas at gmail.com> wrote:
> Hello,
>
> I can see some fundamental differences between this work and my work.
> However, I think they are more complementary than "competitive". My work
> handles code extraction for offloading (at the IR level) and offloading
> control
[...]
> The scheme you refer to could be supported on top of my infrastructure.
> I personally believe that code extraction and transformations for
> offloading should be done at the IR level and not at the source level.
> The reason is that at the IR level you have enough information about
> your program (e.g. data types) and a good idea about your target
> architectures.

Can you please clarify the scheme you propose: are the modules already
separate at the source level? How do you plan to address the architecture
differences at the IR level?

Regards,
Sergos