Christos Margiolas
2015-Jun-05 00:35 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hello All,

For the last two months I have been working on the design and implementation of a heterogeneous execution engine for LLVM. I started this project as an intern at the Qualcomm Innovation Center, and I believe it can be useful to different people and use cases. I am planning to share more details and a set of patches in the next few days. However, I would first like to see whether there is interest in this.

The project is about providing compiler and runtime support for the automatic and transparent offloading of loop or function workloads to accelerators.

It is composed of the following:
a) Compiler and transformation passes for extracting loops or functions for offloading.
b) A runtime library that handles scheduling, data sharing and coherency between the host and accelerator sides.
c) A modular codebase and design. Adaptors specialize the code transformations for the target accelerators. Runtime plugins manage the interaction with the different accelerator environments.

So far, this work supports the Qualcomm DSP accelerator, but I am planning to extend it to support OpenCL accelerators. I have also developed a debug port where I can test the passes and the runtime without requiring an accelerator.

The project is still at an early R&D stage, and I am looking forward to feedback and to gauging the level of interest. I am willing to continue working on this as an open source project and to bring it into the right shape so it can be merged into the LLVM tree.

Regards,
Chris

P.S. I intend to join the LLVM social in the Bay Area tonight and will be more than happy to talk about it.
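The split between a runtime library and per-accelerator runtime plugins, with a debug port that needs no accelerator, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the project's code: the names AcceleratorPlugin, UnavailablePlugin, DebugPlugin, Runtime and scale_loop are all hypothetical, and Python stands in for what would be a C/C++ runtime.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[List[float]], List[float]]


class AcceleratorPlugin:
    """Hypothetical plugin interface: one subclass per accelerator environment."""

    name = "abstract"

    def available(self) -> bool:
        raise NotImplementedError

    def run(self, kernel: Kernel, data: Sequence[float]) -> List[float]:
        raise NotImplementedError


class UnavailablePlugin(AcceleratorPlugin):
    """Hypothetical plugin for an accelerator that is not present on this machine."""

    name = "missing-accelerator"

    def available(self) -> bool:
        return False

    def run(self, kernel: Kernel, data: Sequence[float]) -> List[float]:
        raise RuntimeError("accelerator not present")


class DebugPlugin(AcceleratorPlugin):
    """Stand-in for a debug port: 'offloads' to the host itself."""

    name = "debug"

    def available(self) -> bool:
        return True

    def run(self, kernel: Kernel, data: Sequence[float]) -> List[float]:
        # Copy the input to mimic a transfer into a separate address space.
        device_copy = list(data)
        return kernel(device_copy)


class Runtime:
    """Hypothetical runtime: uses the first available plugin, else runs on the host."""

    def __init__(self, plugins: Sequence[AcceleratorPlugin]) -> None:
        self.plugins = list(plugins)
        self.log: List[str] = []  # records where each workload ran

    def offload(self, kernel: Kernel, data: Sequence[float]) -> List[float]:
        for plugin in self.plugins:
            if plugin.available():
                self.log.append(plugin.name)
                return plugin.run(kernel, data)
        # No plugin can take the work: run the same kernel on the host.
        self.log.append("host")
        return kernel(list(data))


def scale_loop(values: List[float]) -> List[float]:
    """A loop body after it has been extracted into a standalone function."""
    return [2.0 * v for v in values]
```

With Runtime([UnavailablePlugin(), DebugPlugin()]), a call to offload(scale_loop, data) runs through the debug plugin. With no usable plugin it falls back to the host, which is one plausible reading of "transparent" offloading.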
Gerolf Hoflehner
2015-Jun-05 00:57 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi,

I can see even the homogeneous variant of this being useful. Just having the capability of extracting loops and wrapping them into functions and/or modules could help speed up performance analysis and experiments. It would also help with testing the basic infrastructure and heterogeneous environments.

Cheers
Gerolf
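To make "extracting loops and wrapping them into functions" concrete, here is a hand-written before/after sketch. It is an assumption-laden illustration, not output of any LLVM pass: saxpy_inline, saxpy_loop, saxpy_outlined and time_call are hypothetical names, and the extraction is done by hand. The point is that once the loop is its own function, it can be called and timed in isolation.

```python
import time


def saxpy_inline(a, xs, ys):
    """Original form: the loop lives inside the enclosing function."""
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = a * xs[i] + ys[i]
    return out


def saxpy_loop(a, xs, ys, out):
    """Extracted loop: the values it reads or writes become parameters."""
    for i in range(len(xs)):
        out[i] = a * xs[i] + ys[i]


def saxpy_outlined(a, xs, ys):
    """Enclosing function after extraction: the loop is now a single call."""
    out = [0.0] * len(xs)
    saxpy_loop(a, xs, ys, out)
    return out


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, time_call(saxpy_loop, 2.0, xs, ys, out) measures only the extracted loop, without the allocation done by the enclosing function.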
Richard Pennington
2015-Jun-05 01:24 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
On 06/04/2015 07:35 PM, Christos Margiolas wrote:
> It is composed of the following:
> a) Compiler and Transformation Passes for extracting loops or
> functions for offloading.

This sounds really cool. I'm thinking about FPGA offloading.

-Rich
Hi Chris,

Are you offloading the vectorizable loops, or are you also looking at auto-par loops?

-Thx
Dibyendu
Gerolf Hoflehner
2015-Jun-05 22:14 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Christos,

Your idea can certainly go very far, and the capability of extracting and executing one loop nest at a time should enable progress in multiple directions, among them fast performance evaluation of loop transformations and of the benefits of accelerators. It could also be useful for measuring and tuning offloading overhead.

Cheers
Gerolf

> On Jun 5, 2015, at 3:01 AM, Christos Margiolas <chrmargiolas at gmail.com> wrote:
>
> Hi Gerolf,
>
> LLVM already has a utility class for extracting loops to functions. However, extracting code to a different module is something that is missing, I think. Supporting a homogeneous variant is something I have considered, and in fact the debug plugin I have written is a homogeneous variant: I "offload" to the same architecture. The idea could be further extended to a scheme where the offloaded code is loaded in a third process and the main process performs remote procedure calls. This would probably require extensive data transfers across the address spaces, or a memory allocator that operates across the processes by using inter-process shared memory.
>
> --chris
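The quoted message mentions inter-process shared memory as an alternative to copying data across address spaces. A minimal sketch of that idea follows, using Python's standard multiprocessing.shared_memory module rather than anything from the project. The function names offloaded_scale and run_with_shared_buffer are hypothetical. To keep the sketch short, the "worker" side is called in the same process; it attaches to the block by name exactly as a separate process would.

```python
import struct
from multiprocessing import shared_memory

ITEM_SIZE = struct.calcsize("d")  # bytes per C double


def offloaded_scale(shm_name, count, factor):
    """Worker side: attach to the shared block by name and update it in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        # View the raw bytes as doubles; the view is released before close().
        with shm.buf.cast("d") as view:
            for i in range(count):
                view[i] *= factor
    finally:
        shm.close()


def run_with_shared_buffer(values, factor):
    """Host side: place the data in shared memory, run the worker, read results."""
    count = len(values)
    # A shared block must have a non-zero size, so reserve at least one slot.
    shm = shared_memory.SharedMemory(create=True, size=max(count, 1) * ITEM_SIZE)
    try:
        with shm.buf.cast("d") as view:
            for i, value in enumerate(values):
                view[i] = value
        # Only the block name and sizes cross the boundary, not the data itself.
        offloaded_scale(shm.name, count, factor)
        with shm.buf.cast("d") as view:
            return [view[i] for i in range(count)]
    finally:
        shm.close()
        shm.unlink()
```

In a real two-process setup, the call to offloaded_scale would be replaced by starting a multiprocessing.Process with the same arguments and joining it. The data stays in place either way, which is the property the allocator-based scheme relies on.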
Christos Margiolas
2015-Jun-06 10:48 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Gerolf,

Thanks for the interest. I agree that this project may help in different directions, and this is the reason I tried to keep it modular and independent of particular use cases.

--chris
Christos Margiolas
2015-Jun-06 10:49 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Dibyendu,

The design is quite modular. It can support offloading for serial, vectorized or parallel execution. I will provide more details soon.

--chris

On Thu, Jun 4, 2015 at 10:17 PM, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:
> Are you offloading the vectorizable loops or are you looking at auto-par
> loops also ?
Christos Margiolas
2015-Jun-06 10:50 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Richard,

Having an OpenCL plugin would simplify the use of both GPUs and FPGAs. However, I am not sure whether programming FPGAs with OpenCL is mature enough.

--Chris

On Thu, Jun 4, 2015 at 6:24 PM, Richard Pennington <rich at pennware.com> wrote:
> This sounds really cool. I'm thinking about FPGA offloading.