Adve, Vikram Sadanand
2015-Jun-05 10:45 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Christos,

We would be very interested in learning more about this.

In my group, we (Prakalp Srivastava, Maria Kotsifakou and I) have been working on LLVM extensions to make it easier to target a wide range of accelerators in a heterogeneous mobile device, such as Qualcomm's Snapdragon and other APUs. Our approach has been to (a) add better abstractions of parallelism to the LLVM instruction set that can be mapped down to a wide range of parallel hardware accelerators; and (b) develop optimizing "back-end" translators to generate efficient code for the accelerators from the extended IR.

So far, we have been targeting GPUs and vector hardware, but semi-custom (programmable) accelerators are our next goal. We have discussed DSPs as a valuable potential goal as well.

Judging from the brief information here, I'm guessing that our projects have been quite complementary. We have not worked on the extraction passes, scheduling, or other run-time components you mention and would be happy to use an existing solution for those. Our hope is that the IR extensions and translators will give your schedulers greater flexibility to retarget the extracted code components to different accelerators.

--Vikram S. Adve
Visiting Professor, School of Computer and Communication Sciences, EPFL
Professor, Department of Computer Science
University of Illinois at Urbana-Champaign
vadve at illinois.edu
http://llvm.org

On Jun 5, 2015, at 3:18 AM, llvmdev-request at cs.uiuc.edu wrote:
> Date: Thu, 4 Jun 2015 17:35:25 -0700
> From: Christos Margiolas <chrmargiolas at gmail.com>
> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Subject: [LLVMdev] Supporting heterogeneous computing in llvm.
> Message-ID: <CAC3KUCx0mpBrnrGjDVxQzxtBpnJXtw3herZ_E2pQoSqSyMNsKA at mail.gmail.com>
>
> Hello All,
>
> For the last two months I have been working on the design and implementation of a heterogeneous execution engine for LLVM. I started this project as an intern at the Qualcomm Innovation Center and I believe it can be useful to different people and use cases. I am planning to share more details and a set of patches in the next few days. However, I would first like to see if there is interest in this.
>
> The project is about providing compiler and runtime support for the automatic and transparent offloading of loop or function workloads to accelerators.
>
> It is composed of the following:
> a) Compiler and transformation passes for extracting loops or functions for offloading.
> b) A runtime library that handles scheduling, data sharing and coherency between the host and accelerator sides.
> c) A modular codebase and design. Adaptors specialize the code transformations for the target accelerators. Runtime plugins manage the interaction with the different accelerator environments.
>
> So far, this work supports the Qualcomm DSP accelerator, but I am planning to extend it to support OpenCL accelerators. I have also developed a debug port where I can test the passes and the runtime without requiring an accelerator.
>
> The project is still in an early R&D stage and I am looking forward to feedback and to gauging the interest level. I am willing to continue working on this as an open source project and bring it to the right shape so it can be merged with the LLVM tree.
>
> Regards,
> Chris
>
> P.S. I intend to join the LLVM social in the Bay Area tonight and I will be more than happy to talk about it.
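The component split in the quoted proposal (a runtime library, per-accelerator runtime plugins, and a debug port that needs no accelerator) can be pictured with a small C++ sketch. This is only an illustration under stated assumptions: the patches had not been posted at this point in the thread, so every name below (`RuntimePlugin`, `DebugPlugin`, `AbsentAcceleratorPlugin`, `OffloadRuntime`, `scaleOnFirstAvailable`) is hypothetical, and the "first available plugin wins" policy is a placeholder rather than the scheduling the proposal describes.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical kernel type: an extracted loop body that updates a buffer in place.
using Kernel = std::function<void(std::vector<float>&)>;

// Hypothetical plugin interface: one implementation per accelerator environment.
class RuntimePlugin {
public:
    virtual ~RuntimePlugin() = default;
    virtual std::string name() const = 0;
    virtual bool available() const = 0;
    // A real plugin would also move data to and from the device here.
    virtual void run(const Kernel& kernel, std::vector<float>& data) = 0;
};

// Debug port: runs the kernel on the host, so no accelerator is required.
class DebugPlugin : public RuntimePlugin {
public:
    std::string name() const override { return "debug"; }
    bool available() const override { return true; }
    void run(const Kernel& kernel, std::vector<float>& data) override { kernel(data); }
};

// Stand-in for an accelerator that is not present on this machine.
class AbsentAcceleratorPlugin : public RuntimePlugin {
public:
    std::string name() const override { return "absent-accelerator"; }
    bool available() const override { return false; }
    void run(const Kernel&, std::vector<float>&) override {
        assert(available() && "run() called on an unavailable plugin");
    }
};

// Placeholder policy: use the first available plugin in registration order.
class OffloadRuntime {
public:
    void registerPlugin(std::unique_ptr<RuntimePlugin> plugin) {
        plugins_.push_back(std::move(plugin));
    }
    // Runs the kernel and returns the name of the plugin that executed it.
    std::string offload(const Kernel& kernel, std::vector<float>& data) {
        for (auto& plugin : plugins_) {
            if (plugin->available()) {
                plugin->run(kernel, data);
                return plugin->name();
            }
        }
        kernel(data);  // no usable plugin: fall back to plain host execution
        return "host";
    }
private:
    std::vector<std::unique_ptr<RuntimePlugin>> plugins_;
};

// Scales every element by `factor` through whichever plugin is usable.
std::string scaleOnFirstAvailable(std::vector<float>& data, float factor) {
    OffloadRuntime runtime;
    runtime.registerPlugin(std::unique_ptr<RuntimePlugin>(new AbsentAcceleratorPlugin()));
    runtime.registerPlugin(std::unique_ptr<RuntimePlugin>(new DebugPlugin()));
    Kernel scale = [factor](std::vector<float>& d) {
        for (float& x : d) x *= factor;
    };
    return runtime.offload(scale, data);
}
```

Calling `scaleOnFirstAvailable` on a machine without the accelerator skips the unavailable plugin and runs the kernel through the debug plugin, which is the kind of accelerator-free testing the debug port is meant for.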
Christos Margiolas
2015-Jun-06 11:24 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Hello,

Thank you very much for the feedback. I believe that the heterogeneous engine should be strongly connected with parallelization and vectorization efforts. Most accelerators are parallel architectures, where efficient parallelization and vectorization can be critical for performance.

I am interested in these efforts and I hope that my code can help you manage the offloading operations. Your LLVM instruction set extensions may require some changes in the analysis code, but I think that is going to be straightforward.

I am planning to push my code to Phabricator in the next few days.

Thanks,
Chris
On Sat, Jun 6, 2015 at 6:24 PM, Christos Margiolas <chrmargiolas at gmail.com> wrote:
> Hello,
>
> Thank you very much for the feedback. I believe that the heterogeneous engine should be strongly connected with parallelization and vectorization efforts. Most accelerators are parallel architectures, where efficient parallelization and vectorization can be critical for performance.
>
> I am interested in these efforts and I hope that my code can help you manage the offloading operations. Your LLVM instruction set extensions may require some changes in the analysis code, but I think that is going to be straightforward.
>
> I am planning to push my code to Phabricator in the next few days.

If you're doing the extraction at the loop and LLVM IR level, why would you need to modify the IR? Wouldn't the target-level lowering happen later?

How are you actually deciding what to offload? Is this tied to directives, or does it use heuristics plus some set of restrictions?

Lastly, are you handling two targets in the same module, or do you end up emitting two modules and dealing with recombining things later?
Sergey Ostanevich
2015-Jun-08 11:27 UTC
[LLVMdev] Supporting heterogeneous computing in llvm.
Chris,

Have you seen the offloading infrastructure design proposal at http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084986.html ? It relies on the long-standing OpenMP standard, with recent updates to support heterogeneous computation. Could you please review it and comment on how it fits your needs?

It's not quite clear from your proposal which source language standard you plan to support. As far as I understand, you only mention that OpenCL will be one of your backends. What's your plan for source languages: C/C++/FORTRAN? How would you control the offloading, data transfer, scheduling and so on? Will it be new language constructs, similar to parallel_for in Cilk Plus, or will it be pragma-based, as in OpenMP or OpenACC?

The design I mentioned above has a working implementation for the NVIDIA target at

https://github.com/clang-omp/llvm_trunk
https://github.com/clang-omp/clang_trunk

with the runtime implemented at

https://github.com/clang-omp/libomptarget

You're welcome to try it out if you have an appropriate device.

Regards,
Sergos
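For readers unfamiliar with the pragma-based style referred to above, a minimal sketch of OpenMP 4.0 offloading in C++ might look like the following. It is not taken from the linked proposal or repositories; the function name `saxpy` and the choice of kernel are illustrative assumptions. When the compiler is built without OpenMP offloading support, or no device is present, the pragmas are ignored or the region falls back to the host, and the loop still computes the same result.

```cpp
#include <cassert>
#include <vector>

// Computes y[i] = a * x[i] + y[i]. The `target` directive asks the OpenMP
// runtime to run the region on an accelerator if one is available.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const int n = static_cast<int>(x.size());

    // map(to: ...) copies the input to the device before the region runs;
    // map(tofrom: ...) copies y in and copies the result back afterwards.
    #pragma omp target map(to: xp[0:n]) map(tofrom: yp[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Here the programmer states both what to offload (the `target` region) and what data to move (the `map` clauses), which is the directive-driven alternative to the automatic extraction discussed earlier in the thread.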